Reproducible Manuscripts in R

Princeton University

Jason Geller, Ph.D. (he/him)

2024-05-15

Introduction

Packages

library(palmerpenguins) # penguins
library(quarto) # qmd 
library(rmarkdown) # markdown
library(tidyverse) # data wrangling
library(papaja) # apa writing template
install.packages('tinytex') # for use with pdf 
tinytex::install_tinytex()
# to uninstall TinyTeX, run tinytex::uninstall_tinytex() 
  • You should also have Zotero installed along with Better BibTeX (nice, but not necessary)

The Problem

Word

Inside

Word issues

  • A .docx file is a compressed folder with lots of files
    • Your text is buried in with a lot of formatting information
  • Not reproducible
    • Code is divorced from writing
  • Difficult to maintain
    • Errors!
  • What do I share?
    • Lack of transparency

What do we want?

  • Combine narrative with code

  • Automatically generate figures and tables

  • Automatically render results in text

  • Format the content into a scientific paper (including citations!)

  • Something that looks pretty!

  • Rinse & repeat

Hello Quarto!


  • Next generation publishing system.
  • Unifies and extends the R Markdown ecosystem.
  • Develop and switch formats without hassle.

The Quarto hexagon logo.

Big universe

  • RMarkdown for EVERYONE

What is a Quarto?

How Quarto Works

Quarto handles literate programming by using a series of programs:

How Quarto Works (Source)

  • knitr executes all code chunks and creates a new markdown (.md) file
  • pandoc takes the markdown file generated and converts it to the desired format.
  • The Render button in RStudio runs both steps for you.
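
The two stages can be sketched directly in R. This is a minimal illustration, not how Quarto is invoked internally: it uses an R Markdown source file rather than a .qmd, the file contents are made up for the example, and the pandoc step only runs if pandoc is available on the machine.

```r
library(knitr)
library(rmarkdown)

# Build a tiny source file with one R chunk (hypothetical content).
fence <- strrep("`", 3)
src <- tempfile(fileext = ".Rmd")
writeLines(c("Some narrative text.", "", paste0(fence, "{r}"), "1 + 1", fence), src)

# Stage 1: knitr runs the chunk and writes plain markdown.
md <- tempfile(fileext = ".md")
knit(src, output = md, quiet = TRUE)
md_lines <- readLines(md)

# Stage 2: pandoc converts that markdown to the target format.
html <- tempfile(fileext = ".html")
if (pandoc_available()) {
  pandoc_convert(md, to = "html", output = html)
}
```

In day-to-day use, a single call such as `quarto::quarto_render("manuscript.qmd")` (file name hypothetical) runs both stages at once.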

Source vs. Visual Mode

Source Editing Mode

Visual Editing Mode

Advantages

(1) Eliminate human error in copying and pasting results
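
As a sketch of what this looks like in practice, a value reported in the text can be computed from the data instead of typed by hand. The example uses the penguins data loaded earlier; the variable names and the sentence wording are illustrative assumptions.

```r
library(palmerpenguins)

# Compute the statistics once, from the data.
mass <- penguins$body_mass_g
mean_mass <- mean(mass, na.rm = TRUE)
n_obs <- sum(!is.na(mass))

# Build the sentence from the computed values rather than pasting numbers.
sentence <- sprintf("Mean body mass was %.0f g (n = %d).", mean_mass, n_obs)
print(sentence)
```

In a Quarto document the same values can go straight into the prose with inline code such as `` `r round(mean_mass)` ``, so the text updates whenever the data change.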

Advantages

In the Wild: Data Science Gone Wrong

  • Retraction Watch, by Adam Marcus, Ivan Oransky, and Alison McCook, tracks papers that are retracted from journals.

  • A well-known case of an Excel error undermining a paper's results is Growth in a Time of Debt by Reinhart & Rogoff.

    • The error was found by graduate student Thomas Herndon and co-authors Michael Ash and Robert Pollin.
    • They published a critique highlighting the error.
    • Herndon appeared on the Colbert Report to discuss their findings

Advantages

(2) Easy revisions and specification of desired figures and tables

When revisions are requested, tables and figures often have to be tweaked by hand. That creates a strong incentive never to rerun analyses, because doing so means re-pasting every number and redrawing every figure in the paper.

Advantages

(3) Promote computational reproducibility

  • Easy verification and replication of research findings

  • While programming environments may seem counter-intuitive for writing papers, they ultimately prevent mistakes and save time.

Let’s Get Started!

Getting started

  • Approach 1: Start from scratch (now)

    • Creating a Quarto manuscript

      • RStudio: New Project > New Directory > Quarto Manuscript
  • Note

    Always start a new project folder!

  • Approach 2: Start with a sample template (later)

Overview of a Quarto Document

Create a Quarto Document

In the top left, click the white plus icon and select “Quarto Document…”

Creating a new Quarto Document

In the new prompt, enter a title, author name, and press “Create”

New Document Options

Getting started

  • Go to the Getting Started section of the website and complete each part

Annotated Quarto Document

Annotated sections of the “Hello Quarto” document related to document information, text formatting, and code execution

Output of a Quarto Document

Annotated source to output of the “Hello Quarto” document

Metadata & Header (YAML)

---
title: My Reproducible Manuscript
authors:
  - name: Norah Jones
    affiliation: The University
    roles: writing
    corresponding: true
bibliography: references.bib
format: html
---
  • Wait… what’s the YAML acronym?

    • Originally: “Yet Another Markup Language”

    • Later: “YAML Ain’t Markup Language”

  • Set global manuscript options with key-value pairs
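
One way to see the header as key-value pairs is to read it back into R. The sketch below writes the header shown above to a temporary file and parses it with `rmarkdown::yaml_front_matter()`; the temporary file and the body text are only there to make the example self-contained.

```r
library(rmarkdown)

# Recreate the header from this slide in a temporary .qmd file.
qmd <- tempfile(fileext = ".qmd")
writeLines(c(
  "---",
  "title: My Reproducible Manuscript",
  "authors:",
  "  - name: Norah Jones",
  "    affiliation: The University",
  "    roles: writing",
  "    corresponding: true",
  "bibliography: references.bib",
  "format: html",
  "---",
  "",
  "Body text."
), qmd)

# Parse the YAML header into a named list: each key maps to its value.
meta <- yaml_front_matter(qmd)
meta$format
meta$authors[[1]]$name
```

Nested keys such as `authors` come back as nested lists, which is why the author name is reached with `meta$authors[[1]]$name`.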

Code

```{r}
#| eval: true
1 + 1
```
[1] 2
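
Other chunk options are written the same way, one `#|` line per option. The chunk below is a sketch of a figure chunk using the penguins data; the label, caption text, and choice of variables are assumptions made for illustration.

```{r}
#| label: fig-penguins
#| echo: false
#| warning: false
#| fig-cap: "Body mass by flipper length for three penguin species."
library(palmerpenguins)
library(ggplot2)

# Scatterplot of two measurements, colored by species.
p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point()
p
```

Here `echo: false` hides the code in the rendered manuscript, and a label starting with `fig-` lets the text refer to the figure as `@fig-penguins`.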

Text

Section

This is a simple placeholder for the manuscript's main document [@knuth84].

Writing in Markdown (NEXT)